Binary Neural Architecture Search
FIGURE 4.12
The main framework of the Discrepant Child-Parent model. In orange, we show the critical
novelty of DCP-NAS, i.e., tangent propagation and decoupled optimization.
architectures with binarized weights and activations, which considers both the real-valued
architectures and the binarized architectures.
4.4.3 Search Space
We search for computation cells as the building blocks of the final architecture. As in
[305, 307, 151] and Fig. 4.13, we construct the network with a predefined number of cells, and
each cell is a fully connected directed acyclic graph (DAG) G with N nodes. For simplicity,
we assume that each cell only takes the outputs of the two previous cells as input, and
each input node has predefined convolutional operations for preprocessing. Each node j is
obtained by
a(j) = Σ_{i<j} o(i,j)(a(i)),
o(i,j)(a(i)) = w(i,j) ⊗ a(i),
(4.27)
where i ranges over the nodes on which j depends, subject to the constraint i < j to avoid
cycles in a cell, and a(j) is the output of node j. w(i,j) denotes the weights of the
convolution operation between the i-th and j-th nodes, and ⊗ denotes the convolution
operation. Each node is a specific tensor, e.g., a feature map, and each directed edge (i, j)
denotes an operation o(i,j)(·), which is sampled from the following M = 8 operations:
FIGURE 4.13
The cell architecture for DCP-NAS. One cell includes 2 input nodes, 4 intermediate nodes,
and 14 edges.
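The cell computation of Eq. (4.27) can be sketched in a few lines. This is a minimal illustration and not the DCP-NAS implementation: identity maps stand in for the convolutional operations o(i,j), and the function names (`cell_forward`, `ops`) as well as the convention of concatenating the intermediate nodes as the cell output are assumptions borrowed from common DARTS-style search spaces.

```python
import numpy as np

def cell_forward(inputs, ops, num_nodes=4):
    """Evaluate one DAG cell as in Eq. (4.27).

    inputs    : list of 2 arrays, the outputs of the two previous cells
                (the two input nodes of the cell).
    ops       : dict mapping each edge (i, j) to a callable o(i,j).
    num_nodes : number of intermediate nodes (4 in Fig. 4.13).
    """
    states = list(inputs)  # nodes 0 and 1 are the two input nodes
    for j in range(2, 2 + num_nodes):
        # Each node j sums the transformed outputs of all earlier nodes i < j,
        # so the cell is a fully connected DAG with no cycles.
        states.append(sum(ops[(i, j)](states[i]) for i in range(j)))
    # Assumed convention: the cell output concatenates the intermediate nodes.
    return np.concatenate(states[2:])

# Toy example: identity operations on every edge.
a0 = np.ones(3)
a1 = 2 * np.ones(3)
ops = {(i, j): (lambda x: x) for j in range(2, 6) for i in range(j)}
out = cell_forward([a0, a1], ops)
```

Note that with 2 input nodes and 4 intermediate nodes, the edge dictionary has 2 + 3 + 4 + 5 = 14 entries, matching the 14 edges per cell shown in Fig. 4.13.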